Knowledge Recommendations from Another World: Building an Embeddings Cache and Recommending Similar Articles

2023 iThome Ironman Contest

DAY 15

AI & Data

Part 15 of the series "That Time I Got Reincarnated as an AI Prompt Chanter" (關於我轉生變成AI詠唱師這檔事)

15th Ironman Contest

Sam

Team: Quokka In The Cloud

2023-09-30 00:12:35

In this part, we set up a cache to store the embeddings we generate, so they can be reused later instead of being recomputed on every run. We then use these embeddings to find similar articles.

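# Set up a cache for the embeddings
import pandas as pd

embedding_cache_path = "data/recommendations_embeddings_cache.pkl"
try:
    # Reuse the embeddings saved by a previous run
    embedding_cache = pd.read_pickle(embedding_cache_path)
except FileNotFoundError:
    # No cache file yet: start with an empty cache
    embedding_cache = {}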

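With this cache, embeddings that have already been computed can be looked up directly instead of being recomputed, which saves time on repeated runs. Next, we look at how to use these embeddings to recommend similar articles.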
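To make the reuse concrete, here is a minimal sketch of a cache-backed lookup. It is an illustration under assumptions, not the code used in this series: the name `cached_embedding`, the `(text, model)` cache key, and the `compute` callable (a stand-in for whatever function actually requests an embedding) are all hypothetical, and the standard `pickle` module is used for saving.

```python
import pickle
from typing import Callable, Optional

Embedding = list[float]
CacheKey = tuple[str, str]  # assumed key layout: (text, model name)


def cached_embedding(
    text: str,
    model: str,
    cache: dict[CacheKey, Embedding],
    compute: Callable[[str, str], Embedding],
    cache_path: Optional[str] = None,
) -> Embedding:
    """Return the embedding for `text`, computing it only on a cache miss."""
    key = (text, model)
    if key not in cache:
        # Cache miss: call the (hypothetical) embedding function once
        cache[key] = compute(text, model)
        if cache_path is not None:
            # Persist the updated cache so later runs can reuse it
            with open(cache_path, "wb") as cache_file:
                pickle.dump(cache, cache_file)
    return cache[key]
```

Calling `cached_embedding` twice with the same text and model triggers `compute` only once; the second call is answered from the dictionary. Writing the whole cache to disk on every miss is simple but can become slow for large caches, so saving in batches may be a better fit in practice.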

# Recommend similar articles based on embeddings
def print_recommendations_from_strings(
    strings: list[str],
    index_of_source_string: int,
    k_nearest_neighbors: int = 1,
    model=EMBEDDING_MODEL,
) -> list[int]:
    # Embed every string, then pick out the embedding of the source string
    embeddings = [embedding_from_string(string, model=model) for string in strings]
    query_embedding = embeddings[index_of_source_string]
    distances = distances_from_embeddings(query_embedding, embeddings, distance_metric="cosine")
    indices_of_nearest_neighbors = indices_of_nearest_neighbors_from_distances(distances)
    # ... (part of the code omitted)
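The distance and ranking steps can be sketched in plain Python as follows. This is only an illustration of the idea, assuming non-zero embedding vectors: `cosine_distance` and `nearest_neighbor_indices` are hypothetical stand-ins, not the actual implementations of `distances_from_embeddings` or `indices_of_nearest_neighbors_from_distances`, and the sketch does not try to reproduce the omitted part of the function above.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbor_indices(
    query_index: int, embeddings: list[list[float]], k: int = 1
) -> list[int]:
    """Return the indices of the k embeddings closest to the query embedding."""
    query = embeddings[query_index]
    distances = [cosine_distance(query, embedding) for embedding in embeddings]
    # Sort all indices from the smallest distance to the largest
    ranked = sorted(range(len(embeddings)), key=lambda i: distances[i])
    # Skip the query itself, which always has distance 0 to itself
    return [i for i in ranked if i != query_index][:k]
```

A smaller cosine distance means the two vectors point in more similar directions, so the articles at the returned indices are the ones most similar to the source article.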